Abstract

Sequential Monte Carlo methods, also known as particle methods, are a widely used set
of computational tools for inference in non-linear non-Gaussian state-space models. In many
applications it may be necessary to compute the sensitivity, or derivative, of the optimal filter
with respect to the static parameters of the state-space model; for instance, in order to obtain
maximum likelihood model parameters of interest, or to compute the optimal controller in an
optimal control problem. In Poyiadjis et al. [2011] an original particle algorithm to compute
the filter derivative was proposed and it was shown using numerical examples that the particle
estimate was numerically stable in the sense that it did not deteriorate over time. In this paper
we substantiate this claim with a detailed theoretical study. $L_p$ bounds and a central limit
theorem for this particle approximation of the filter derivative are presented. It is further shown
that under mixing conditions these $L_p$ bounds and the asymptotic variance characterized by
the central limit theorem are uniformly bounded with respect to the time index. We demonstrate
the performance predicted by theory with several numerical examples. We also use the
particle approximation of the filter derivative to perform online maximum likelihood parameter
estimation for a stochastic volatility model.

Some key words: Hidden Markov Models, State-Space Models, Sequential Monte Carlo,
Smoothing, Filter derivative, Recursive Maximum Likelihood.

1 Introduction 

State-space models are a very popular class of non-linear and non-Gaussian time series models in statistics, econometrics and information engineering; see for example Cappé et al. [2005], Doucet et al. [2001], Durbin and Koopman [2001]. A state-space model is comprised of a pair of discrete-time stochastic processes, $\{X_n\}_{n \geq 0}$ and $\{Y_n\}_{n \geq 0}$, where the former is an $\mathcal{X}$-valued unobserved process and the latter is a $\mathcal{Y}$-valued process which is observed. The hidden process $\{X_n\}_{n \geq 0}$ is a Markov process with initial law $dx\,\pi_\theta(x)$ and time homogeneous transition law $dx' f_\theta(x'|x)$, i.e.

$$X_0 \sim dx_0\,\pi_\theta(x_0) \quad \text{and} \quad X_n \mid (X_{n-1} = x_{n-1}) \sim dx_n\, f_\theta(x_n|x_{n-1}), \quad n \geq 1. \tag{1.1}$$

It is assumed that the observations $\{Y_n\}_{n \geq 0}$ conditioned upon $\{X_n\}_{n \geq 0}$ are statistically independent and have marginal laws

$$Y_n \mid (\{X_k\}_{k \geq 0} = \{x_k\}_{k \geq 0}) \sim dy_n\, g_\theta(y_n|x_n). \tag{1.2}$$

Here $\pi_\theta(x)$, $f_\theta(x|x')$ and $g_\theta(y|x)$ are densities with respect to (w.r.t.) suitable dominating measures denoted generically as $dx$ and $dy$. For example, if $\mathcal{X}$ and $\mathcal{Y}$ are subsets of Euclidean spaces then the dominating measures could be the Lebesgue measures. The variable $\theta$ in the densities denotes the particular parameters of the model. The set of possible values for $\theta$, denoted $\Theta$, is assumed to be an open subset of $\mathbb{R}^d$. The model (1.1)-(1.2) is also often referred to as a hidden Markov model in the literature [Cappé et al., 2005].



For a sequence $\{z_n\}_{n \geq 0}$ and integers $i, j$, let $z_{i:j}$ denote the set $\{z_i, z_{i+1}, \ldots, z_j\}$, which is empty if $j < i$. Equations (1.1) and (1.2) define the law of $(X_{0:n}, Y_{0:n-1})$ which is given by the measure

$$dx_0\,\pi_\theta(x_0) \prod_{k=1}^{n} dx_k\, f_\theta(x_k|x_{k-1}) \prod_{k=0}^{n-1} dy_k\, g_\theta(y_k|x_k), \tag{1.3}$$



from which the probability density of the observed process, or likelihood, is obtained

$$p_\theta(y_{0:n-1}) = \int dx_0\,\pi_\theta(x_0) \prod_{k=1}^{n} dx_k\, f_\theta(x_k|x_{k-1}) \prod_{k=0}^{n-1} g_\theta(y_k|x_k). \tag{1.4}$$



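For a realization of observations $Y_{0:n-1} = y_{0:n-1}$, let $\mathbb{Q}_{\theta,n}$ denote the law of $X_{0:n}$ conditioned on this sequence of observed variables, i.e.

$$\mathbb{Q}_{\theta,n}(dx_{0:n}) = \frac{1}{p_\theta(y_{0:n-1})} \left( dx_0\,\pi_\theta(x_0)\, g_\theta(y_0|x_0) \prod_{k=1}^{n-1} dx_k\, f_\theta(x_k|x_{k-1})\, g_\theta(y_k|x_k) \right) dx_n\, f_\theta(x_n|x_{n-1}).$$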

Let $\eta_{\theta,n}$ denote the time $n$ marginal of $\mathbb{Q}_{\theta,n}$. This marginal, which we call the filter, may be computed recursively using Bayes' formula:

$$\eta_{\theta,n+1}(dx_{n+1}) = \mathbb{Q}_{\theta,n+1}(dx_{n+1}) = dx_{n+1}\, \frac{\int \eta_{\theta,n}(dx_n)\, g_\theta(y_n|x_n)\, f_\theta(x_{n+1}|x_n)}{\int \eta_{\theta,n}(dx'_n)\, g_\theta(y_n|x'_n)}, \quad n \geq 0,$$

and $\eta_{\theta,0} = \pi_\theta$ by convention. Except for simple models such as the linear Gaussian state-space model or when $\mathcal{X}$ is a finite set, it is impossible to compute $p_\theta(y_{0:n})$, $\mathbb{Q}_{\theta,n}$ or $\eta_{\theta,n}$ exactly. Particle methods have been applied extensively to approximate these quantities for general state-space models of the form (1.1)-(1.2); see Cappé et al. [2005], Doucet et al. [2001].

The particle approximation of $\mathbb{Q}_{\theta,n}$ is the empirical measure corresponding to a set of $N \geq 1$ random samples termed particles, that is

$$\mathbb{Q}^{p,N}_{\theta,n}(dx_{0:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}_{0:n}}(dx_{0:n}), \tag{1.5}$$

where $\delta_z(dz)$ denotes the Dirac delta mass located at $z$. This approximation is referred to as the path space approximation [Del Moral, 2004] and it is denoted by the superscript 'p'. The particle approximation of $\eta_{\theta,n}$ is obtained from $\mathbb{Q}^{p,N}_{\theta,n}$ by marginalization

$$\eta^{N}_{\theta,n}(dx_n) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}_n}(dx_n).$$

These particles are propagated in time using importance sampling and resampling steps; see Doucet et al. [2001] and Cappé et al. [2005] for a review of the literature. Specifically, $\mathbb{Q}^{p,N}_{\theta,n+1}$ is the empirical measure constructed from $N$ independent samples from

$$\frac{\mathbb{Q}^{p,N}_{\theta,n}(dx_{0:n})\, dx_{n+1}\, f_\theta(x_{n+1}|x_n)\, g_\theta(y_n|x_n)}{\int \mathbb{Q}^{p,N}_{\theta,n}(dx'_{0:n})\, g_\theta(y_n|x'_n)}. \tag{1.6}$$



It is a well known fact that the particle approximation of $\mathbb{Q}_{\theta,n}$ becomes progressively impoverished as $n$ increases because of the successive resampling steps [Del Moral and Doucet, 2003, Olsson et al., 2008]. That is, the number of distinct particles representing the marginal $\mathbb{Q}^{p,N}_{\theta,n}(dx_{0:k})$ for any fixed $k < n$ diminishes as $n$ increases until it collapses to a single particle; this is known as the particle path degeneracy problem.
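
The path degeneracy described above can be observed numerically. The following is a minimal sketch and is not taken from the cited references: it assumes a toy scalar AR(1) state observed in Gaussian noise (the constants `phi`, `sigma`, `obs_std` and the function name `count_surviving_ancestors` are hypothetical), runs a bootstrap-type particle filter with multinomial resampling of whole paths at every step, and records how many distinct time-0 particles remain in the ancestry of the current population.

```python
import math
import random

def count_surviving_ancestors(n_particles=100, n_steps=200, seed=0):
    """Bootstrap filter on a toy AR(1) model, tracking time-0 ancestors.

    Returns, for each time step, the number of distinct time-0 particles
    still present in the ancestry of the current particle population.
    """
    rng = random.Random(seed)
    phi, sigma, obs_std = 0.8, 1.0, 1.0   # hypothetical model constants

    # Simulate a hidden AR(1) path and noisy observations of it.
    x_true, y = 0.0, []
    for _ in range(n_steps):
        y.append(x_true + obs_std * rng.gauss(0.0, 1.0))
        x_true = phi * x_true + sigma * rng.gauss(0.0, 1.0)

    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    root = list(range(n_particles))       # time-0 ancestor of each path
    distinct = []
    for n in range(n_steps):
        # Weight each path by g(y_n | x_n), then resample whole paths.
        logw = [-0.5 * ((y[n] - x) / obs_std) ** 2 for x in particles]
        m = max(logw)
        w = [math.exp(v - m) for v in logw]
        idx = rng.choices(range(n_particles), weights=w, k=n_particles)
        root = [root[i] for i in idx]
        # Propagate through the transition density f(x_{n+1} | x_n).
        particles = [phi * particles[i] + sigma * rng.gauss(0.0, 1.0) for i in idx]
        distinct.append(len(set(root)))
    return distinct
```

Since a resampling step can only discard ancestors, the returned counts are non-increasing in time, which is the collapse described in the text.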

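The focus of this paper is on the convergence properties of particle methods which have been recently proposed to approximate the derivative of the measures $\{\eta_{\theta,n}(dx_n)\}_{n \geq 0}$ w.r.t. $\theta = [\theta_1, \ldots, \theta_d]^{T}$:

$$\zeta_{\theta,n} = \nabla \eta_{\theta,n} = \left[ \frac{\partial \eta_{\theta,n}}{\partial \theta_1}, \ldots, \frac{\partial \eta_{\theta,n}}{\partial \theta_d} \right]^{T}.$$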



(See Section 2 for a definition.) References Cérou et al. [2001] and Doucet and Tadić [2003] present particle methods which have a computational complexity that scales linearly with the number $N$ of particles. It was shown in Poyiadjis et al. [2011] (see also Poyiadjis et al. [2009] for a more detailed numerical study) that the performance of these $O(N)$ methods, which inherently rely on the particle approximations of $\{\mathbb{Q}_{\theta,n}\}_{n \geq 0}$ constructed as in (1.6) above, degraded over time and it was conjectured that this may be attributed to the particle path degeneracy problem. In contrast, the alternative method of Poyiadjis et al. [2005] was shown in numerical examples to be stable. The method of Poyiadjis et al. [2005] is a non-standard particle implementation that avoids the particle path degeneracy problem at the expense of a computational complexity per time step which is quadratic in the number of particles, i.e. $O(N^2)$; see Section 2 for more details. Supported by numerical examples, it was conjectured in Poyiadjis et al. [2011] that even under strong mixing assumptions, the variance of the estimate of the filter derivative computed with the $O(N)$ methods increases at least linearly in time while that of the $O(N^2)$ method is uniformly bounded w.r.t. the time index. This conjecture is confirmed in this paper. Specifically, we analyze the $O(N^2)$ implementation of Poyiadjis et al. [2005] in Section 3 and obtain results on the errors of the approximation; in particular, $L_p$ bounds and a Central Limit Theorem (CLT) are presented. We show that these $L_p$ bounds and asymptotic variances appearing in the CLT are uniformly bounded w.r.t. the time index when the state-space model satisfies certain mixing assumptions. In contrast, the asymptotic variance of the $O(N)$ implementations, which is also captured through the CLT, is shown to increase linearly. To the best of our knowledge, these are the first results of this kind.

An important application of our results, which is discussed in detail in Section 4, is to the problem of estimating the parameters of the model (1.1)-(1.2) from observed data. The estimates of the model parameters are found by maximizing the likelihood function $p_\theta(y_{0:n})$ with respect to $\theta$ using a gradient ascent algorithm which relies on the particle approximation of the filter derivative. The results we present in Section 3 have bearing on the performance of the parameter estimation algorithm, which we illustrate with numerical examples in Section 4. The Appendix contains the proofs of the main results as well as that of some supporting auxiliary results. As a final remark, although the algorithms and theoretical results are presented for a state-space model, they may be reinterpreted for Feynman-Kac models as well.
1.1 Notation and definitions

We give some basic definitions from probability and operator semigroup theory. For a measurable space $(E, \mathcal{E})$ let $\mathcal{M}(E)$ denote the set of all finite signed measures and $\mathcal{P}(E)$ the set of all probability measures on $E$. The $n$-fold product space $E \times \cdots \times E$ is denoted by $E^n$. Let $\mathcal{B}(E)$ denote the Banach space of all bounded real-valued and measurable functions $\varphi : E \to \mathbb{R}$ equipped with the uniform norm $\|\varphi\| = \sup_{x \in E} |\varphi(x)|$. For $\nu \in \mathcal{M}(E)$ and $\varphi \in \mathcal{B}(E)$, let $\nu(\varphi) = \int \nu(dx)\, \varphi(x)$ be the Lebesgue integral of $\varphi$ w.r.t. $\nu$. If $\nu$ is a density w.r.t. some dominating measure $dx$ on $E$ then $\nu(\varphi) = \int dx\, \nu(x)\, \varphi(x)$. We recall that a bounded integral kernel $M(x, dx')$ from a measurable space $(E, \mathcal{E})$ into an auxiliary measurable space $(E', \mathcal{E}')$ is an operator $\varphi \mapsto M(\varphi)$ from $\mathcal{B}(E')$ into $\mathcal{B}(E)$ such that the functions

$$x \mapsto M(\varphi)(x) := \int M(x, dx')\, \varphi(x')$$

are $\mathcal{E}$-measurable and bounded for any $\varphi \in \mathcal{B}(E')$. The kernel $M$ also generates a dual operator $\nu \mapsto \nu M$ from $\mathcal{M}(E)$ into $\mathcal{M}(E')$ defined by

$$(\nu M)(\varphi) := \nu(M(\varphi)).$$

Given a pair of bounded integral operators $(M_1, M_2)$, we let $(M_1 M_2)$ denote the composition operator defined by $(M_1 M_2)(\varphi) = M_1(M_2(\varphi))$.

A Markov kernel is a positive and bounded integral operator $M$ such that $M(1)(x) = 1$ for any $x \in E$. For $\varphi \in \mathcal{B}(E)$, let

$$\mathrm{osc}(\varphi) = \sup_{x, x' \in E} |\varphi(x) - \varphi(x')|$$

and let

$$\mathrm{Osc}_1(E) = \{\varphi \in \mathcal{B}(E) : \mathrm{osc}(\varphi) \leq 1\}.$$

Let $\beta(M) \in [0, 1]$ denote the Dobrushin coefficient of the Markov kernel $M$ which is defined by the formula [Del Moral, 2004, Prop. 4.2.1]:

$$\beta(M) := \sup \{\mathrm{osc}(M(\varphi)) \,;\, \varphi \in \mathrm{Osc}_1(E)\}.$$

If there exists a positive constant $\rho$ such that the Markov kernel $M$ satisfies

$$M(x, dz) \geq \rho\, M(x', dz) \quad \text{for all } x, x' \in E$$

then $\beta(M) \leq 1 - \rho$.

For two Markov kernels $M_1, M_2$, $\beta(M_1 M_2) \leq \beta(M_1)\beta(M_2)$.
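
As an illustration of these definitions, the sketch below works on a finite state space, where a Markov kernel is a row-stochastic matrix and the Dobrushin coefficient equals the largest total variation distance between two rows. The matrices `M1`, `M2` and the helper names are hypothetical and chosen only for the example; the minorization helper assumes strictly positive entries.

```python
def dobrushin(M):
    """Dobrushin coefficient of a row-stochastic matrix M (list of rows).

    On a finite state space, sup{osc(M(phi)) : osc(phi) <= 1} equals the
    largest total variation distance between two rows of M.
    """
    return max(
        0.5 * sum(abs(a - b) for a, b in zip(row_x, row_y))
        for row_x in M for row_y in M
    )

def minorization_constant(M):
    """Largest rho with M[x][z] >= rho * M[x'][z] for all x, x', z.

    Assumes every entry of M is strictly positive.
    """
    columns = list(zip(*M))
    return min(min(col) / max(col) for col in columns)

def matmul(A, B):
    """Composition (A B) of two kernels given as matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical 3-state kernels used only for illustration.
M1 = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
M2 = [[0.6, 0.2, 0.2], [0.1, 0.7, 0.2], [0.3, 0.3, 0.4]]

beta1, beta2 = dobrushin(M1), dobrushin(M2)
beta12 = dobrushin(matmul(M1, M2))
rho1 = minorization_constant(M1)
```

With these matrices one can check numerically that `beta1 <= 1 - rho1` and `beta12 <= beta1 * beta2`, in line with the two properties stated above.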

Given a positive function $G$ on $E$, let $\Psi_G : \nu \in \mathcal{P}(E) \mapsto \Psi_G(\nu) \in \mathcal{P}(E)$ be the probability distribution defined by

$$\Psi_G(\nu)(dx) := \frac{\nu(dx)\, G(x)}{\nu(G)}$$

provided $\infty > \nu(G) > 0$. The definitions above also apply if $\nu$ is a density and $M$ is a transition density. In this case all instances of $\nu(dx)$ should be replaced with $dx\, \nu(x)$ and $M(x, dx')$ by $dx'\, M(x, x')$ where $dx$ and $dx'$ is generic notation for the dominating measures.
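It is convenient to introduce the following transition kernels:

$$Q_{\theta,n}(x_{n-1}, dx_n) = g_\theta(y_{n-1}|x_{n-1})\, dx_n\, f_\theta(x_n|x_{n-1}) = dx_n\, q_\theta(x_n|x_{n-1}), \quad n > 0,$$

$$Q_{\theta,k,n}(x_k, dx_n) = (Q_{\theta,k+1} Q_{\theta,k+2} \cdots Q_{\theta,n})(x_k, dx_n), \quad 0 \leq k \leq n,$$

with the convention that $Q_{\theta,n,n} = Id$, the identity operator. Note that $Q_{\theta,k,n}(1)(x_k)$ is the density of the law of $Y_{k:n-1}$ given $X_k = x_k$. For $0 \leq p \leq n$, define the potential function $G_{\theta,p,n}$ on $\mathcal{X}$ to be

$$G_{\theta,p,n}(x_p) = Q_{\theta,p,n}(1)(x_p) / \eta_{\theta,p} Q_{\theta,p,n}(1). \tag{1.7}$$

Let the mapping $\Phi_{\theta,k,n} : \mathcal{P}(\mathcal{X}) \to \mathcal{P}(\mathcal{X})$, $0 \leq k \leq n$, be defined as follows

$$\Phi_{\theta,k,n}(\nu)(dx_n) = \frac{\nu Q_{\theta,k,n}(dx_n)}{\nu Q_{\theta,k,n}(1)}.$$

It follows that $\eta_{\theta,n} = \Phi_{\theta,k,n}(\eta_{\theta,k})$. For conciseness, we also write $\Phi_{\theta,n-1,n}$ as $\Phi_{\theta,n}$.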

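A key quantity that facilitates the recursive computation of the derivative of $\eta_{\theta,n}$ is the following collection of backward Markov transition kernels:

$$M_{\theta,n}(x_n, dx_{n-1}) = \frac{\eta_{\theta,n-1}(dx_{n-1})\, q_\theta(x_n|x_{n-1})}{\eta_{\theta,n-1}(q_\theta(x_n|\cdot))}, \quad n > 0. \tag{1.8}$$

Their particle approximations are

$$M^{N}_{\theta,n}(x_n, dx_{n-1}) = \frac{\eta^{N}_{\theta,n-1}(dx_{n-1})\, q_\theta(x_n|x_{n-1})}{\eta^{N}_{\theta,n-1}(q_\theta(x_n|\cdot))}. \tag{1.9}$$

These backward Markov kernels are convenient for computing certain conditional expectations and probability measures. In particular, for $\varphi \in \mathcal{B}(\mathcal{X}^2)$, we have

$$E_\theta[\varphi(X_{n-1}, X_n) \mid y_{0:n-1}, x_n] = \int M_{\theta,n}(x_n, dx_{n-1})\, \varphi(x_{n-1}, x_n),$$

and the law of $X_{0:n-1}$ given $X_n = x_n$ and $Y_{0:n-1} = y_{0:n-1}$ is $M_{\theta,n}(x_n, dx_{n-1}) \cdots M_{\theta,1}(x_1, dx_0)$.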

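Finally, the following two definitions are needed for the CLT of the particle approximation of the derivative of $\eta_{\theta,n}$. The bounded integral operator $D_{\theta,k,n}$ from $\mathcal{X}$ into $\mathcal{X}^{n+1}$ is defined for any $F_n \in \mathcal{B}(\mathcal{X}^{n+1})$ by

$$D_{\theta,k,n}(F_n)(x_k) := \int \left( \prod_{j=k}^{1} M_{\theta,j}(x_j, dx_{j-1}) \right) \left( \prod_{j=k}^{n-1} Q_{\theta,j+1}(x_j, dx_{j+1}) \right) F_n(x_{0:n}), \quad 0 \leq k \leq n, \tag{1.10}$$

with the convention that $\prod_{\emptyset} = 1$. The particle approximation, $D^{N}_{\theta,k,n}$, is defined to be

$$D^{N}_{\theta,k,n}(F_n)(x_k) := \int \left( \prod_{j=k}^{1} M^{N}_{\theta,j}(x_j, dx_{j-1}) \right) \left( \prod_{j=k}^{n-1} Q_{\theta,j+1}(x_j, dx_{j+1}) \right) F_n(x_{0:n}). \tag{1.11}$$

To be concise we write

$$\eta_{\theta,k}(dx_k)\, D_{\theta,k,n}(x_k, dx_{0:k-1}, dx_{k+1:n}) \quad \text{as} \quad \eta_{\theta,k} D_{\theta,k,n}(dx_{0:n}).$$

(And similarly for the particle versions.) Although convention dictates that $\eta_{\theta,k} D_{\theta,k,n}$ should be understood as the measure $(\eta_{\theta,k} D_{\theta,k,n})(dx_{0:k-1}, dx_{k+1:n})$, when we mean otherwise it should be clear from the infinitesimal neighborhood.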
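2 Computing the filter derivative

For any $F_n \in \mathcal{B}(\mathcal{X}^{n+1})$, we have

$$\begin{aligned}
\nabla \int \mathbb{Q}_{\theta,n}(dx_{0:n})\, F_n(x_{0:n})
&= \frac{1}{p_\theta(y_{0:n-1})} \int dx_{0:n}\, \nabla \left( \pi_\theta(x_0) \prod_{k=1}^{n} f_\theta(x_k|x_{k-1}) \prod_{k=0}^{n-1} g_\theta(y_k|x_k) \right) F_n(x_{0:n}) \\
&\quad - \frac{\nabla p_\theta(y_{0:n-1})}{p_\theta(y_{0:n-1})}\, E_\theta\{F_n(X_{0:n}) \mid y_{0:n-1}\} \\
&= E_\theta\{F_n(X_{0:n})\, T_{\theta,n}(X_{0:n}) \mid y_{0:n-1}\} - E_\theta\{F_n(X_{0:n}) \mid y_{0:n-1}\}\, E_\theta\{T_{\theta,n}(X_{0:n}) \mid y_{0:n-1}\}
\end{aligned} \tag{2.1}$$

where

$$T_{\theta,n}(x_{0:n}) = \sum_{k=0}^{n} t_{\theta,k}(x_{k-1}, x_k), \tag{2.2}$$

$$t_{\theta,k}(x_{k-1}, x_k) = \nabla \log \left( g_\theta(y_{k-1}|x_{k-1})\, f_\theta(x_k|x_{k-1}) \right), \quad k > 0, \tag{2.3}$$

$$t_{\theta,0}(x_{-1}, x_0) = t_{\theta,0}(x_0) = \nabla \log \pi_\theta(x_0). \tag{2.4}$$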



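The first equality in (2.1) follows from the definition of $\mathbb{Q}_{\theta,n}$ and interchanging the order of differentiation and integration. The interchange is permissible under certain regularity conditions [Pflug, 1996]; e.g. a sufficient condition would be the main assumption in Section 3 under which the uniform stability results are proved. The second equality follows from a change of measure, which then permits an importance sampling based estimator for the derivative of $\mathbb{Q}_{\theta,n}$; this is the well known score method, e.g. see Pflug [1996, Section 4.2.1]. For any $\varphi_n \in \mathcal{B}(\mathcal{X})$ it follows by setting $F_n(x_{0:n}) = \varphi_n(x_n)$ in (2.1) that

$$\begin{aligned}
\nabla \int \eta_{\theta,n}(dx_n)\, \varphi_n(x_n)
&= E_\theta\{\varphi_n(X_n)\, T_{\theta,n}(X_{0:n}) \mid y_{0:n-1}\} - E_\theta\{\varphi_n(X_n) \mid y_{0:n-1}\}\, E_\theta\{T_{\theta,n}(X_{0:n}) \mid y_{0:n-1}\} \\
&= \int \zeta_{\theta,n}(dx_n)\, \varphi_n(x_n)
\end{aligned}$$

where

$$\zeta_{\theta,n}(dx_n) = \eta_{\theta,n}(dx_n) \left( E_\theta[T_{\theta,n}(X_{0:n}) \mid y_{0:n-1}, x_n] - E_\theta[T_{\theta,n}(X_{0:n}) \mid y_{0:n-1}] \right). \tag{2.5}$$

We call $\zeta_{\theta,n}$ the derivative of $\eta_{\theta,n}$.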

Given the particle approximation (1.5) of $\mathbb{Q}_{\theta,n}$, it is straightforward to construct a particle approximation of $\zeta_{\theta,n}$:

$$\zeta^{p,N}_{\theta,n}(dx_n) = \frac{1}{N} \sum_{i=1}^{N} \left( T_{\theta,n}(X^{(i)}_{0:n}) - \frac{1}{N} \sum_{j=1}^{N} T_{\theta,n}(X^{(j)}_{0:n}) \right) \delta_{X^{(i)}_n}(dx_n). \tag{2.6}$$

This approximation is also referred to as the path space method. Such approximations were implicitly proposed in Cérou et al. [2001] and Doucet and Tadić [2003] and there are several reasons why this estimate appears attractive. Firstly, even with the resampling steps in the construction of $\mathbb{Q}^{p,N}_{\theta,n}$, $\zeta^{p,N}_{\theta,n}$ can be computed recursively. Secondly, there is no need to store the entire ancestry of each particle, i.e. $\{X^{(i)}_{0:n}\}_{1 \leq i \leq N}$, and thus the memory requirement to construct $\zeta^{p,N}_{\theta,n}$ is constant over time. Thirdly, the computational cost per time step is $O(N)$. However, as $\mathbb{Q}^{p,N}_{\theta,n}$ suffers from the particle path degeneracy problem, we expect the approximation $\zeta^{p,N}_{\theta,n}$ to worsen over time. This was indeed observed in numerical examples in Poyiadjis et al. [2011] and it was conjectured that the asymptotic variance (i.e. as $N \to \infty$) of $\zeta^{p,N}_{\theta,n}$ for bounded integrands would increase linearly with $n$ even under strong mixing assumptions. This is now proven in this article.
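
To make the recursive, constant-memory nature of the path space method concrete, here is a minimal sketch of one $O(N)$ time step. It is an illustration under stated assumptions rather than the implementation of the cited references: the model is a hypothetical scalar AR(1) state $X_n = \phi X_{n-1} + \sigma V_n$ observed as $Y_n = X_n + W_n$ with Gaussian noise, the only parameter is $\theta = \phi$, the initial law is assumed not to depend on $\phi$, and the function name `path_space_step` is invented for the example. Each particle carries only its current state and its accumulated value of $T_{\theta,n}$.

```python
import math
import random

def path_space_step(x_prev, T_prev, y_prev, phi, sigma, obs_std, rng):
    """One O(N) step of the path space recursion for a toy AR(1) model.

    x_prev : particles X_{n-1}^{(i)}
    T_prev : accumulated values T_{n-1}(X_{0:n-1}^{(i)})
    Returns the new particles, their accumulated T values, and the
    centred weights multiplying the Dirac masses in (2.6).
    """
    n = len(x_prev)
    # Resample ancestors with weights g(y_{n-1} | x_{n-1}), up to a constant.
    logw = [-0.5 * ((y_prev - x) / obs_std) ** 2 for x in x_prev]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    anc = rng.choices(range(n), weights=w, k=n)
    x_new, T_new = [], []
    for a in anc:
        # Propagate through f(x_n | x_{n-1}).
        x = phi * x_prev[a] + sigma * rng.gauss(0.0, 1.0)
        # t_n(x_{n-1}, x_n) = d/dphi log f(x_n | x_{n-1}); g does not depend on phi.
        t_n = (x - phi * x_prev[a]) * x_prev[a] / sigma ** 2
        x_new.append(x)
        T_new.append(T_prev[a] + t_n)   # inherit the ancestor's T, add the increment
    mean_T = sum(T_new) / n
    centered = [(T - mean_T) / n for T in T_new]
    return x_new, T_new, centered
```

The centred weights sum to zero, reflecting that the derivative of a probability measure has total mass zero. Because `T_new` is inherited through the resampled ancestors, it is affected by the path degeneracy discussed above.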



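An alternative particle method to approximate $\{\zeta_{\theta,n}\}_{n \geq 0}$ has been proposed in Poyiadjis et al. [2005, 2011]. We now reinterpret this method using the representation in (2.5) and a different particle approximation of $\mathbb{Q}_{\theta,n}$ that avoids the path degeneracy problem. The measure $\mathbb{Q}_{\theta,n}$ admits the following backward representation

$$\mathbb{Q}_{\theta,n}(dx_{0:n}) = \eta_{\theta,n}(dx_n) \prod_{k=n}^{1} M_{\theta,k}(x_k, dx_{k-1})$$

and the corresponding particle approximation of $\mathbb{Q}_{\theta,n}$ is given by

$$\mathbb{Q}^{N}_{\theta,n}(dx_{0:n}) = \eta^{N}_{\theta,n}(dx_n) \prod_{k=n}^{1} M^{N}_{\theta,k}(x_k, dx_{k-1})$$

where $M^{N}_{\theta,k}$ was defined in (1.9). This now gives rise to the following particle approximation of $\zeta_{\theta,n}$ [Poyiadjis et al., 2005, 2011]:

$$\zeta^{N}_{\theta,n}(\varphi_n) = \int \mathbb{Q}^{N}_{\theta,n}(dx_{0:n})\, T_{\theta,n}(x_{0:n}) \left( \varphi_n(x_n) - \eta^{N}_{\theta,n}(\varphi_n) \right)$$

and indeed $\eta^{N}_{\theta,n}(\varphi_n) = \int \mathbb{Q}^{N}_{\theta,n}(dx_{0:n})\, \varphi_n(x_n)$. It is apparent that $\zeta^{N}_{\theta,n}$ constructed using this backward method avoids the degeneracy in paths. It is even possible to compute $\zeta^{N}_{\theta,n}$ recursively as detailed in Algorithm 1; since a recursion for $\eta_{\theta,n}$ is already available, it is apparent from (2.5) that what remains is to specify a recursion for $E_\theta[T_{\theta,n}(X_{0:n}) \mid y_{0:n-1}, x_n]$. Let $\overline{T}_{\theta,n}(x_n)$ denote this term, then for $n \geq 1$,

$$\begin{aligned}
\overline{T}_{\theta,n}(x_n) &= E_\theta[T_{\theta,n}(X_{0:n}) \mid y_{0:n-1}, x_n] \\
&= \int M_{\theta,n}(x_n, dx_{n-1}) \left( E_\theta[T_{\theta,n-1}(X_{0:n-1}) \mid y_{0:n-2}, x_{n-1}] + t_{\theta,n}(x_{n-1}, x_n) \right) \\
&= \int M_{\theta,n}(x_n, dx_{n-1}) \left( \overline{T}_{\theta,n-1}(x_{n-1}) + t_{\theta,n}(x_{n-1}, x_n) \right)
\end{aligned}$$

where $\overline{T}_{\theta,0}(x_0) = t_{\theta,0}(x_0)$. Algorithm 1 computes $\zeta^{N}_{\theta,n}$ recursively in time by computing $(\overline{T}^{N}_{\theta,n}, \eta^{N}_{\theta,n})$ and is initialized with $\overline{T}^{N,(i)}_{\theta,0} = t_{\theta,0}(X^{(i)}_0)$ (see (2.2)) where $\{X^{(i)}_0\}_{1 \leq i \leq N}$ are samples from $\pi_\theta(x_0)$.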

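Algorithm 1: A Particle Method to Compute the Filter Derivative

• Assume at time $n-1$ that approximate samples $\{X^{(i)}_{n-1}\}_{1 \leq i \leq N}$ from $\eta_{\theta,n-1}$ and approximations $\{\overline{T}^{N,(i)}_{\theta,n-1}\}_{1 \leq i \leq N}$ of $\{\overline{T}_{\theta,n-1}(X^{(i)}_{n-1})\}_{1 \leq i \leq N}$ are available.

• At time $n$, sample $\{X^{(i)}_n\}_{1 \leq i \leq N}$ independently from the mixture

$$\Phi_{\theta,n}(\eta^{N}_{\theta,n-1})(dx_n) = \frac{\sum_{j=1}^{N} g_\theta(y_{n-1}|X^{(j)}_{n-1})\, f_\theta(x_n|X^{(j)}_{n-1})\, dx_n}{\sum_{j=1}^{N} g_\theta(y_{n-1}|X^{(j)}_{n-1})} \tag{2.7}$$

and then compute $\{\overline{T}^{N,(i)}_{\theta,n}\}_{1 \leq i \leq N}$ and $\zeta^{N}_{\theta,n}$ as follows:

$$\overline{T}^{N,(i)}_{\theta,n} = \frac{\sum_{j=1}^{N} \left( \overline{T}^{N,(j)}_{\theta,n-1} + t_{\theta,n}(X^{(j)}_{n-1}, X^{(i)}_n) \right) q_\theta(X^{(i)}_n|X^{(j)}_{n-1})}{\sum_{j=1}^{N} q_\theta(X^{(i)}_n|X^{(j)}_{n-1})}, \tag{2.8}$$

$$\zeta^{N}_{\theta,n}(dx_n) = \frac{1}{N} \sum_{i=1}^{N} \left( \overline{T}^{N,(i)}_{\theta,n} - \frac{1}{N} \sum_{j=1}^{N} \overline{T}^{N,(j)}_{\theta,n} \right) \delta_{X^{(i)}_n}(dx_n). \tag{2.9}$$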
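
One time step of Algorithm 1 can be sketched as follows. This is a minimal illustration under assumptions, not the reference implementation: as a stand-in model it uses a hypothetical scalar AR(1) state $X_n = \phi X_{n-1} + \sigma V_n$ observed as $Y_n = X_n + W_n$ with Gaussian noise, takes $\theta = \phi$ as the only parameter, and uses the bootstrap proposal; the function name `algorithm1_step` and its arguments are invented for the example. The double loop over particles is the source of the $O(N^2)$ cost per time step.

```python
import math
import random

def algorithm1_step(x_prev, Tbar_prev, y_prev, phi, sigma, obs_std, rng):
    """One O(N^2) step of the backward-kernel recursion for a toy AR(1) model."""
    n = len(x_prev)
    # (2.7): mixture weights g(y_{n-1} | X_{n-1}^{(j)}), up to a constant.
    logg = [-0.5 * ((y_prev - x) / obs_std) ** 2 for x in x_prev]
    m = max(logg)
    g = [math.exp(v - m) for v in logg]
    anc = rng.choices(range(n), weights=g, k=n)
    x_new = [phi * x_prev[a] + sigma * rng.gauss(0.0, 1.0) for a in anc]

    Tbar_new = []
    for xi in x_new:
        # Backward weights q(x_i | x_j) = g(y_{n-1} | x_j) f(x_i | x_j), up to a constant.
        resid = [xi - phi * xj for xj in x_prev]
        logq = [lg - 0.5 * (r / sigma) ** 2 for lg, r in zip(logg, resid)]
        mq = max(logq)
        q = [math.exp(v - mq) for v in logq]
        # (2.8): average Tbar_{n-1} + t_n over the normalised backward weights,
        # with t_n = d/dphi log f(x_i | x_j) = (x_i - phi x_j) x_j / sigma^2.
        num = sum(qj * (Tj + r * xj / sigma ** 2)
                  for qj, Tj, r, xj in zip(q, Tbar_prev, resid, x_prev))
        Tbar_new.append(num / sum(q))
    # (2.9): centred weights attached to the Dirac masses at x_new.
    mean_T = sum(Tbar_new) / n
    zeta_w = [(T - mean_T) / n for T in Tbar_new]
    return x_new, Tbar_new, zeta_w
```

Unlike the path space recursion, the new value for particle $i$ averages over all particles at time $n-1$ instead of inheriting the value of a single resampled ancestor, so it does not depend on the resampled genealogy.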



Algorithm 1 uses the bootstrap particle filter of Gordon et al. [1993]. Note that any SMC implementation of $\{\eta_{\theta,n}\}_{n \geq 0}$ may be used, e.g. the auxiliary SMC method of Pitt and Shephard [1999] or sequential importance resampling with a tailored proposal distribution [Doucet et al., 2001]. It was conjectured in Poyiadjis et al. [2011] that the asymptotic variance of $\zeta^{N}_{\theta,n}(\varphi)$ for bounded integrands $\varphi$ is uniformly bounded w.r.t. $n$ under mixing assumptions. This is established in this article.



3 Stability of the particle estimates

The convergence analysis of $\zeta^{N}_{\theta,n}$ (and $\zeta^{p,N}_{\theta,n}$ for performance comparison) will largely focus on the convergence analysis of the $N$-particle measures $\mathbb{Q}^{N}_{\theta,n}$ (and correspondingly $\mathbb{Q}^{p,N}_{\theta,n}$) towards their limiting values $\mathbb{Q}_{\theta,n}$, as $N \to \infty$, which is in turn intimately related to the convergence of the flow of particle measures $\{\eta^{N}_{\theta,n}\}_{n \geq 0}$ towards their limiting measures $\{\eta_{\theta,n}\}_{n \geq 0}$. The error bounds and the central limit theorem presented here have been derived using the techniques developed in Del Moral [2004] for the convergence analysis of the particle occupation measures $\eta^{N}_{\theta,n}$. One of the central objects in this analysis is the local sampling errors defined as

$$V^{N}_{\theta,n} = \sqrt{N} \left( \eta^{N}_{\theta,n} - \Phi_{\theta,n}(\eta^{N}_{\theta,n-1}) \right). \tag{3.1}$$

The fluctuation and the deviations of these centered random measures can be estimated using non-asymptotic Kintchine's type $L_r$-inequalities, as well as Hoeffding's or Bernstein's type exponential deviations [Del Moral, 2004, Del Moral and Rio, 2009]. In Del Moral and Miclo [2000] it is proved that these random perturbations behave asymptotically as Gaussian random perturbations; see Lemma 7.10 in the Appendix for more details. In the proof of Theorem 7.11 (a supporting theorem) in the Appendix we provide some key decompositions expressing the deviation of the particle measures $\mathbb{Q}^{N}_{\theta,n}$ around its limiting value $\mathbb{Q}_{\theta,n}$ in terms of the local sampling errors $(V^{N}_{\theta,0}, \ldots, V^{N}_{\theta,n})$. These decompositions are key to deriving the $L_r$-mean error bounds and central limit theorems for the filter derivative.

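The following regularity conditions are assumed.

(A) The dominating measures $dx$ on $\mathcal{X}$ and $dy$ on $\mathcal{Y}$ are finite, and there exist constants $0 < \rho, \delta, c < \infty$ such that for all $(x, x', y, \theta) \in \mathcal{X}^2 \times \mathcal{Y} \times \Theta$, the derivatives of $\pi_\theta(x)$, $f_\theta(x'|x)$ and $g_\theta(y|x)$ with respect to $\theta$ exist and

$$\rho^{-1} \leq f_\theta(x'|x) \leq \rho, \qquad \delta^{-1} \leq g_\theta(y|x) \leq \delta, \tag{3.2}$$

$$|\nabla \log \pi_\theta(x)| \vee |\nabla \log f_\theta(x'|x)| \vee |\nabla \log g_\theta(y|x)| \leq c. \tag{3.3}$$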

Admittedly, these conditions are restrictive and fail to hold for many models in practice. (Exceptions would include applications with a compact state-space.) However, they are typically made to establish the time uniform stability of particle approximations of the filter [Del Moral, 2004, Cappé et al., 2005] as they lead to simpler and more transparent proofs. Also, we observe that the behaviors predicted by the Theorems below seem to hold in practice even in cases where the state-space models do not satisfy these assumptions; see Section 4. Thus the results in this paper can be seen to provide a qualitative guide to the behavior of the particle approximation even in the more general setting.

For each parameter vector $\theta \in \Theta$, realization of observations $y = \{y_n\}_{n \geq 0}$ and particle number $N$, let $(\Omega, \mathcal{F}, \mathbb{P}^{y}_{\theta})$ be the underlying probability space of the random process $\{(X^{(1)}_n, \ldots, X^{(N)}_n)\}_{n \geq 0}$ comprised of the particle system only. Let $\mathbb{E}^{y}_{\theta}$ be the corresponding expectation operator computed with respect to $\mathbb{P}^{y}_{\theta}$. The first of the two main results in this section is a time uniform non-asymptotic error bound.

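Theorem 3.1 Assume (A). For any $r \geq 1$, there exists a constant $C_r$ such that for all $\theta \in \Theta$, $y = \{y_n\}_{n \geq 0}$, $n \geq 0$, $N \geq 1$, and $\varphi_n \in \mathrm{Osc}_1(\mathcal{X})$,

$$\sqrt{N}\; \mathbb{E}^{y}_{\theta} \left\{ \left| (\zeta^{N}_{\theta,n} - \zeta_{\theta,n})(\varphi_n) \right|^{r} \right\}^{1/r} \leq C_r.$$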

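Let $\{V_{\theta,n}\}_{n \geq 0}$ be a sequence of independent centered Gaussian random fields defined as follows. For any sequence $\{\varphi_n\}_{n \geq 0}$ in $\mathcal{B}(\mathcal{X})$ and any $p \geq 0$, $\{V_{\theta,n}(\varphi_n)\}_{n=0}^{p}$ is a collection of independent zero-mean Gaussian random variables with variances given by

$$\eta_{\theta,n}(\varphi_n^2) - \eta_{\theta,n}(\varphi_n)^2. \tag{3.4}$$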

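Theorem 3.2 Assume (A). There exists a constant $C < \infty$ such that for any $\theta \in \Theta$, $y = \{y_n\}_{n \geq 0}$, $n \geq 0$ and $\varphi_n \in \mathrm{Osc}_1(\mathcal{X})$, $\sqrt{N} (\zeta^{N}_{\theta,n} - \zeta_{\theta,n})(\varphi_n)$ converges in law, as $N \to \infty$, to the centered Gaussian random variable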

l,^^-" ) (3.5) 

whose variance is uniformly bounded above by C where 
The proofs of both these results are in the Appendix. 

As a comparison, we quantify the variance of the particle estimate of the filter derivative computed using the path-based method (see (2.6)). Consider the following simplified example that serves to illustrate the point. Let $g_\theta(y|x) = g(y|x)$ (that is $\theta$-independent) and $f_\theta(x_n|x_{n-1}) = \pi_\theta(x_n)$, where $\pi_\theta$ is the initial distribution. (Note that $f_\theta$ in this case satisfies a rephrased version of (3.2) under which the conclusion of Theorem 3.2 also holds.) Also, consider the sequence of repeated observations $y_0 = y_1 = \cdots$ where $y_0$ is arbitrary. Applying Lemma 7.12 (in the Appendix) that characterizes the limiting distribution of $\sqrt{N}(\mathbb{Q}^{p,N}_{\theta,n} - \mathbb{Q}_{\theta,n})$ to this special case results in $\sqrt{N}(\zeta^{p,N}_{\theta,n} - \zeta_{\theta,n})(\varphi)$ (see (2.6)) having an asymptotic distribution which is Gaussian with mean zero and variance

n X Trg{Tp^yg [(VlogTre)^] + ng logirgf] - WTrg{ipf 

where = ip — Trg(ip), TTg{x) = 'ng{x)g (yo| x) /irg {g (yol •))■ This variance increases linearly with time 
in contrast to the time bounded variance of Theorem 3.2.



4 Application to recursive parameter estimation

Being able to compute $\{\zeta_{\theta,n}\}_{n \geq 0}$ is particularly useful when performing online static parameter estimation for state-space models using Recursive Maximum Likelihood (RML) techniques [Le Gland and Mevel, 1997, Poyiadjis et al., 2005, 2011]; see also Kantas et al. [2009] for a general review of available particle methods based solutions, including Bayesian ones, for this problem. The computed filter derivative may also be useful in other areas; e.g. see Coquelin et al. [2008] for an application in control.
4.1 Recursive Maximum Likelihood

Let $\theta^*$ be the true static parameter generating the observed data $\{y_n\}_{n \geq 0}$. Given a finite record of observations $y_{0:T}$, the log-likelihood may be maximized with the following steepest ascent algorithm:

$$\theta_k = \theta_{k-1} + \gamma_k \left. \nabla \log p_\theta(y_{0:T}) \right|_{\theta = \theta_{k-1}}, \quad k \geq 1, \tag{4.1}$$

where $\theta_0$ is some arbitrary initial guess of $\theta^*$, $\nabla \log p_\theta(y_{0:T})|_{\theta = \theta_{k-1}}$ denotes the gradient of the log-likelihood evaluated at the current parameter estimate and $\{\gamma_k\}_{k \geq 1}$ is a decreasing positive real-valued step-size sequence, which should satisfy the following constraints:

$$\sum_{k=1}^{\infty} \gamma_k = \infty, \qquad \sum_{k=1}^{\infty} \gamma_k^2 < \infty.$$

Although $\nabla \log p_\theta(y_{0:T})$ can be computed using (4.3), the computation cost can be prohibitive for a long data record since each iteration of (4.1) would require a complete browse through the $T + 1$ data points. A more attractive alternative would be a recursive procedure in which the data is run through once only sequentially. For example, consider the following update scheme:



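$$\theta_n = \theta_{n-1} + \gamma_n \left. \nabla \log p_\theta(y_n|y_{0:n-1}) \right|_{\theta = \theta_{n-1}} \tag{4.2}$$

where $\nabla \log p_\theta(y_n|y_{0:n-1})|_{\theta = \theta_{n-1}}$ denotes the gradient of $\log p_\theta(y_n|y_{0:n-1})$ evaluated at the current parameter estimate; that is, upon receiving $y_n$, $\theta_{n-1}$ is updated in the direction of ascent of the conditional density of this new observation. Since we have

$$\left. \nabla \log p_\theta(y_n|y_{0:n-1}) \right|_{\theta = \theta_{n-1}} = \frac{\int dx_n\, \eta_{\theta_{n-1},n}(x_n) \left. \nabla g_\theta(y_n|x_n) \right|_{\theta = \theta_{n-1}} + \int dx_n\, \zeta_{\theta_{n-1},n}(x_n)\, g_{\theta_{n-1}}(y_n|x_n)}{\int dx_n\, \eta_{\theta_{n-1},n}(x_n)\, g_{\theta_{n-1}}(y_n|x_n)} \tag{4.3}$$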

this clearly requires the filter derivative $\zeta_{\theta,n}$. The algorithm in the present form is not suitable for online implementation as it requires re-computing the filter and its derivative at the value $\theta = \theta_{n-1}$ from time zero. The RML procedure uses an approximation of (4.3) which is obtained by updating the filter and its derivative using the parameter value $\theta_{n-1}$ at time $n$; we refer the reader to Le Gland and Mevel [1997] for details. The asymptotic properties of the RML algorithm, i.e. the behavior of $\theta_n$ in the limit as $n$ goes to infinity, have been studied in the case of an i.i.d. hidden process by Titterington [1984] and by Le Gland and Mevel [1997] for a finite state-space hidden Markov model. It is shown in Le Gland and Mevel [1997] that under regularity conditions this algorithm converges towards a local maximum of the average log-likelihood and that this average log-likelihood is maximized at $\theta^*$. A particle version of the RML algorithm of Le Gland and Mevel [1997] that uses Algorithm 1's estimate of $\eta_{\theta,n}$ is presented as Algorithm 2.



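Algorithm 2: Particle Recursive Maximum Likelihood

• At time $n-1$ we are given $y_{0:n-1}$, the previous estimate $\theta_{n-1}$ of $\theta^*$ and $\{(X^{(i)}_{n-1}, \overline{T}^{(i)}_{n-1})\}_{1 \leq i \leq N}$.

• At time $n$, upon receiving $y_n$, sample $\{X^{(i)}_n\}_{1 \leq i \leq N}$ independently from (2.7) using parameter $\theta = \theta_{n-1}$ to obtain

$$\eta^{N}_{n} = \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}_n}$$

and then compute

$$\overline{T}^{(i)}_{n} = \frac{\sum_{j=1}^{N} \left( \overline{T}^{(j)}_{n-1} + t_{\theta_{n-1},n}(X^{(j)}_{n-1}, X^{(i)}_n) \right) q_{\theta_{n-1}}(X^{(i)}_n|X^{(j)}_{n-1})}{\sum_{j=1}^{N} q_{\theta_{n-1}}(X^{(i)}_n|X^{(j)}_{n-1})} \tag{4.4}$$

and

$$\widehat{\nabla \log p}(y_n|y_{0:n-1}) = \frac{\frac{1}{N} \sum_{i=1}^{N} \left( \left. \nabla g_\theta(y_n|X^{(i)}_n) \right|_{\theta = \theta_{n-1}} + \left( \overline{T}^{(i)}_{n} - \frac{1}{N} \sum_{j=1}^{N} \overline{T}^{(j)}_{n} \right) g_{\theta_{n-1}}(y_n|X^{(i)}_n) \right)}{\frac{1}{N} \sum_{i=1}^{N} g_{\theta_{n-1}}(y_n|X^{(i)}_n)}. \tag{4.5}$$

Finally update the parameter:

$$\theta_n = \theta_{n-1} + \gamma_n\, \widehat{\nabla \log p}(y_n|y_{0:n-1}). \tag{4.6}$$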



Under Assumption A, the particle approximation of the filter is stable [Del Moral, 2004]; see also Lemma 7.4 in the Appendix. This combined with the proven stability of the particle approximation of the filter derivative implies that the particle estimate of the derivative of $\log p(y_n|y_{0:n-1})$ is also stable.
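
The mechanics of the recursive update (4.2) and the role of the step-size constraints can be seen on a deliberately simple example. The sketch below is not the particle algorithm: it assumes a hypothetical i.i.d. model $Y_n \sim \mathcal{N}(\theta, 1)$ with no hidden state, for which the gradient of $\log p_\theta(y_n)$ is $y_n - \theta$ in closed form, and uses $\gamma_n = 1/n$, which satisfies $\sum_n \gamma_n = \infty$ and $\sum_n \gamma_n^2 < \infty$. The function name `rml_gaussian_mean` is invented for the example.

```python
import random

def rml_gaussian_mean(y, theta0=0.0):
    """Recursive gradient ascent for the mean of i.i.d. N(theta, 1) data."""
    theta = theta0
    for n, y_n in enumerate(y, start=1):
        gamma_n = 1.0 / n        # sum gamma_n = inf, sum gamma_n^2 < inf
        grad = y_n - theta       # gradient of log N(y_n; theta, 1) in theta
        theta += gamma_n * grad  # one pass, one update per observation
    return theta

rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(1000)]
estimate = rml_gaussian_mean(data, theta0=-5.0)
```

With this step-size the recursion reproduces the running sample mean, so the initial guess is forgotten after the first observation. In the state-space setting the closed-form gradient is replaced by a particle estimate, which is why the stability of that estimate over time matters for the parameter sequence.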

4.2 Simulations

The RML algorithm is applied to the following stochastic volatility model [Pitt and Shephard, 1999]:

$$X_0 \sim \mathcal{N}\left(0, \frac{\sigma^2}{1 - \phi^2}\right), \qquad X_{n+1} = \phi X_n + \sigma V_{n+1},$$

$$Y_n = \beta \exp(X_n/2)\, W_n,$$

where $\mathcal{N}(m, s)$ denotes a Gaussian random variable with mean $m$ and variance $s$, $V_n \sim \mathcal{N}(0, 1)$ and $W_n \sim \mathcal{N}(0, 1)$ are two mutually independent sequences, both independent of the initial state $X_0$. The model parameters, $\theta = (\phi, \sigma, \beta)$, are to be estimated.
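
For readers who wish to reproduce experiments of this kind, the following sketch simulates data from a stochastic volatility model. It assumes the commonly used form $X_0 \sim \mathcal{N}(0, \sigma^2/(1-\phi^2))$, $X_{n+1} = \phi X_n + \sigma V_{n+1}$, $Y_n = \beta \exp(X_n/2) W_n$ with standard Gaussian noises; the function name `simulate_sv` and its signature are hypothetical.

```python
import math
import random

def simulate_sv(n_steps, phi, sigma, beta, seed=0):
    """Simulate hidden log-volatilities x and observations y of an SV model."""
    rng = random.Random(seed)
    # Start from the stationary law of the AR(1) state (requires |phi| < 1).
    x = [rng.gauss(0.0, sigma / math.sqrt(1.0 - phi ** 2))]
    for _ in range(n_steps - 1):
        x.append(phi * x[-1] + sigma * rng.gauss(0.0, 1.0))
    # Observation noise is scaled by beta * exp(x_n / 2).
    y = [beta * math.exp(xn / 2.0) * rng.gauss(0.0, 1.0) for xn in x]
    return x, y

# Parameter values matching theta* = (0.8, sqrt(0.1), 1) used in the text.
x, y = simulate_sv(20000, 0.8, math.sqrt(0.1), 1.0, seed=1)
```

Starting the state from its stationary law means the empirical variance of `x` should be close to $\sigma^2/(1-\phi^2)$ for long runs.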

Our first example demonstrates the theoretical results in Section 3. The estimate of $\partial/\partial\sigma \log p(y_{n:n+L-1}|y_{0:n-1})$ at $\theta^* = (0.8, \sqrt{0.1}, 1)$ was computed using Algorithm 1 with 500 particles and using the path-space method (see (2.6)) with 2.5 x 10^ particles for the stochastic volatility model. The block size $L$ was 500. Shown in Figure 1 is the variance of these particle estimates for various values of $n$ derived from many independent random replications of the simulation. The linear increase of the variance of the path-space method as predicted by theory is evident although Assumption A is not satisfied.

For the path-space method, because the variance of the estimate of the filter derivative grows
linearly in time, the eventual high variance in the gradient estimate can result in the divergence of the
parameter estimates. To illustrate this point, (4.6) was implemented with the path-space estimate of
the filter derivative (|2.6p computed with 10000 particles and constant step-size sequence, 7„ = 10"'' 
for all $n$. $\theta_0$ was initialized at the true parameter value. A sequence of two million observations was
simulated with $\theta^* = (0.8, \sqrt{0.1}, 1)$. The results are shown in Figure 3.

For the same value of $\theta^*$ and sequence of observations used in the previous example, Algorithm
2 was executed with 500 particles and 7„ = 0.01, n < 10^, 7„ = (n - 5 x 10"')"°-^, n > 10^. As it 



Figure 1: Variance of the particle estimates of $\partial/\partial\sigma \log p(y_{n:n+500-1}|y_{0:n-1})$ for various values of $n$
for the stochastic volatility model. Circles are variance of Algorithm 1's estimate with 500 particles.
Stars indicate the variance of the estimate of the path-space method with 2.5 x 10^ particles. Dotted
line is best fitting straight line to path-space method's variance to indicate trend.




Figure 2: Sequence of recursive parameter estimates, $\theta_n = (\sigma_n, \phi_n, \beta_n)$, computed using (4.6) with
$N = 500$. From top to bottom: $\beta_n$, $\phi_n$ and $\sigma_n$ and marked on the right are the "converged values"
which were taken to be the empirical average of the last 1000 values.

Figure 3: RML for stochastic volatility with path-space gradient estimate with 10,000 particles,
constant step-size and initialized at the true parameter values which are indicated by the dashed
lines. From top to bottom, $\phi$, $\beta$ and $\sigma$.



can be seen from the results in Figure 2, the estimate converges to a value in the neighborhood of
the true parameter.



5 Conclusion

We have presented theoretical results establishing the uniform stability of the particle approximation of the optimal filter derivative proposed in Poyiadjis et al. [2005, 2009]. While these results have been presented in the context of state-space models, they can also be applied to Feynman-Kac models [Del Moral, 2004] which could potentially enlarge the range of applications. For example, if $dx' f_\theta(x'|x)$ is reversible w.r.t. some probability measure $\mu_\theta$ and if we replace $g_\theta(y_n|x_n)$ with a time-homogeneous potential function $g_\theta(x_n)$ then $\eta_{\theta,n}$ converges, as $n \to \infty$, to the probability measure $\mu_{\theta,h}$ defined as

$$\mu_{\theta,h}(dx) = \frac{1}{\mu_\theta\left( h_\theta \int dx' f_\theta(x'|\cdot)\, h_\theta(x') \right)}\; \mu_\theta(dx)\, h_\theta(x) \int dx' f_\theta(x'|x)\, h_\theta(x')$$

where $h_\theta$ is a positive eigenmeasure associated with the top eigenvalue of the integral operator $Q_\theta(x, dx') = g_\theta(x)\, dx' f_\theta(x'|x)$ (see Section 12.4 of Del Moral [2004]). The measure $\mu_{\theta,h}$ is the invariant measure of the $h$-process defined as the Markov chain with transition kernel $M_\theta(x, dx') \propto dx' f_\theta(x'|x)\, h_\theta(x')$. The particle algorithm described here can be directly used to approximate the derivative of this invariant measure w.r.t. $\theta$. It would also be of interest to weaken Assumption A and there are several ways this might be approached. For example, for non-ergodic signals using ideas in Oudjane and Rubenthaler [2005], Heine and Crisan [2008] or via Foster-Lyapunov conditions as in Beskos et al. [2011], Whiteley [2011].
7 Appendix

The statement of the results in this section hold for any $\theta$ and any sequence of observations $y = \{y_n\}_{n \geq 0}$. All mathematical expectations are taken with respect to the law of the particle system only for the specific $\theta$ and $y$ under consideration. While $\theta$ is retained in the statement of the results, it is omitted in the proofs. The superscript $y$ of the expectation operator is also omitted in the proofs. This section commences with some essential definitions in addition to those in Section 1.1. Let

}0,k,n{xk,dXn) 



Pe,k,n{xk,dXn) 



and 



Me,p{xp,dxo;p^i) = Y\_ Me.k{xk,dxk-i), p > 0, 

k—p 

and its corresponding particle approximation is 



M0^{xp,dxo:p^i) = Y[ Mgj,{xk,dxk-i) 

k—p 

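To make the subsequent expressions more terse, let

$$\widetilde{\eta}^N_{\theta,n} = \Phi_{\theta,n}(\eta^N_{\theta,n-1}), \quad n \ge 0,$$

where $\widetilde{\eta}^N_{\theta,0} = \Phi_{\theta,0}(\eta^N_{\theta,-1}) = \eta_{\theta,0} = \pi_\theta$ by convention. (Recall $\Phi_{\theta,n} = \Phi_{\theta,n-1,n}$.) Let

$$\mathcal{F}^N_n = \sigma\left(X_k^{(i)};\ 0 \le k \le n,\ 1 \le i \le N\right), \quad n \ge 0, \tag{7.1}$$

be the natural filtration associated with the $N$-particle approximation model and let $\mathcal{F}^N_{-1}$ be the trivial sigma field.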

The following estimates are a straightforward consequence of Assumption (A). For all 9 and time 
indices < k < q < n, 



9,k,n 



QeM,q{Xk,dXq)Qe^q^„{l){Xq) 
Qe,k,q{Qe,q,n{^)){Xk) 



and for 6*, < fc < g, 

A4^,(x, dz) < M^^,{x\ dz) =^ p (M,^, . . . M,^,) < (1 - p-y-"-"-' 



(7.2) 
(7.3) 



Note that setting $q = n$ in (7.2) yields an estimate for $\beta(P_{\theta,k,n})$.
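To make the role of estimates of this type concrete, the sketch below works on a finite state space and assumes that $\beta$ denotes the Dobrushin contraction coefficient, i.e. the largest total variation distance between two rows of a Markov kernel (this reading of $\beta$, the matrices `M1`, `M2` and all function names are assumptions made for illustration). It checks two standard facts: a kernel satisfying $M(x, dz) \le c\, M(x', dz)$ for all $x, x'$ has $\beta(M) \le 1 - c^{-1}$, and $\beta$ is submultiplicative under composition, which is what produces geometric decay in the number of composed kernels.

```python
# Dobrushin coefficient of a finite Markov kernel; illustrative sketch only.

def dobrushin(M):
    """beta(M) = max over row pairs of the total variation distance between rows."""
    rows = range(len(M))
    return max(0.5 * sum(abs(a - b) for a, b in zip(M[i], M[j]))
               for i in rows for j in rows)

def matmul(A, B):
    """Compose two kernels: (A B)(x, z) = sum_y A(x, y) B(y, z)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def ratio_constant(M):
    """Smallest c such that M(x, z) <= c * M(x', z) for all x, x', z."""
    rows, cols = range(len(M)), range(len(M[0]))
    return max(M[i][z] / M[j][z] for i in rows for j in rows for z in cols)

# Two assumed strictly positive kernels on a three-point space.
M1 = [[0.5, 0.3, 0.2],
      [0.2, 0.5, 0.3],
      [0.3, 0.3, 0.4]]
M2 = [[0.40, 0.40, 0.20],
      [0.25, 0.50, 0.25],
      [0.30, 0.20, 0.50]]

c1, c2 = ratio_constant(M1), ratio_constant(M2)
beta1, beta2 = dobrushin(M1), dobrushin(M2)
beta12 = dobrushin(matmul(M1, M2))
```

Here `beta1` and `beta2` are bounded by `1 - 1/c1` and `1 - 1/c2` respectively, and `beta12` is bounded by `beta1 * beta2`, so composing $m$ such kernels gives a coefficient of order $(1 - c^{-1})^m$.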

Several auxiliary results are now presented, all of which hinge on the following Khintchine type moment bound proved in Del Moral [2004, Lem. 7.3.3].

Lemma 7.1 [Del Moral, 2004, Lemma 7.3.3] Let $\mu$ be a probability measure on the measurable space $(E, \mathcal{E})$. Let $G$ and $h$ be $\mathcal{E}$-measurable functions satisfying $G(x) \ge c\, G(x') > 0$ for all $x, x' \in E$ where $c$ is some finite positive constant. Let $\{X^{(i)}\}_{1 \le i \le N}$ be a collection of independent random samples from $\mu$. If $h$ has finite oscillation then for any integer $r \ge 1$ there exists a finite constant $a_r$, independent of $N$, $G$ and $h$, such that

$$\sqrt{N}\; \mathbb{E}\left( \left| \frac{\sum_{i=1}^N G(X^{(i)})\, h(X^{(i)})}{\sum_{i=1}^N G(X^{(i)})} - \frac{\mu(Gh)}{\mu(G)} \right|^r \right)^{1/r} \le c^{-1}\, \mathrm{osc}(h)\, a_r.$$
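Proof:

The result for $G = 1$ and $c = 1$ is proved in Del Moral [2004]. The case stated here can be established using the representation

$$\frac{\mu^N(Gh)}{\mu^N(G)} - \frac{\mu(Gh)}{\mu(G)} = \frac{\mu(G)}{\mu^N(G)} \left(\mu^N - \mu\right)\left( \frac{G}{\mu(G)} \left( h - \frac{\mu(Gh)}{\mu(G)} \right) \right)$$

where $\mu^N(dx) = N^{-1} \sum_{i=1}^N \delta_{X^{(i)}}(dx)$.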
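The representation used in this proof is a purely algebraic identity, so it can be verified exactly on a toy example. The sketch below assumes a four-point space with hand-picked `mu`, `G` and `h` (all hypothetical values), draws an empirical measure `mu_N`, and evaluates both sides of the identity relating the self-normalised error to a centred empirical average.

```python
import random

random.seed(0)

# Assumed toy ingredients on a four-point space.
points = [0, 1, 2, 3]
mu = [0.1, 0.2, 0.3, 0.4]        # target probability measure
G = [1.0, 1.5, 2.0, 1.2]         # positive weight function
h = [0.3, -0.1, 0.8, 0.5]        # bounded test function

def integrate(measure, func):
    """Integral of func against a measure on the finite space."""
    return sum(m * v for m, v in zip(measure, func))

# Empirical measure mu_N built from N independent samples of mu.
N = 50
sample = random.choices(points, weights=mu, k=N)
mu_N = [sample.count(x) / N for x in points]

Gh = [a * b for a, b in zip(G, h)]
ratio = integrate(mu, Gh) / integrate(mu, G)

# Left-hand side: error of the self-normalised estimate.
lhs = integrate(mu_N, Gh) / integrate(mu_N, G) - ratio

# Right-hand side: rescaled empirical average of a function centred under mu.
centred = [G[x] / integrate(mu, G) * (h[x] - ratio) for x in points]
rhs = (integrate(mu, G) / integrate(mu_N, G)) * (
    integrate(mu_N, centred) - integrate(mu, centred)
)
```

Both sides coincide up to rounding for any realisation of the sample, since the centred function integrates to zero under `mu`; this is what allows the case $G = 1$ to be applied to it.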



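Remark 7.2 For $k \ge 0$, let $h^N_{k-1}$ be a $\mathcal{F}^N_{k-1}$ measurable function satisfying $h^N_{k-1} \in \mathrm{Osc}_1(\mathcal{X})$ almost surely. Then Lemma 7.1 can be invoked to establish

$$\sqrt{N}\; \mathbb{E}\left( \left| \frac{\eta^N_{\theta,k}(G h^N_{k-1})}{\eta^N_{\theta,k}(G)} - \frac{\Phi_{\theta,k}(\eta^N_{\theta,k-1})(G h^N_{k-1})}{\Phi_{\theta,k}(\eta^N_{\theta,k-1})(G)} \right|^r \,\middle|\, \mathcal{F}^N_{k-1} \right)^{1/r} \le c^{-1}\, a_r$$

where $G$ is defined as in Lemma 7.1.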

Lemma [7.31 to Lemma [7^ are a consequence of Lemma [7TT1 and the estimates in (|7.2p . 

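Lemma 7.3 For any $r \ge 1$ there exists a finite constant $a_r$ such that the following inequality holds for all $\theta$, $y$, $0 \le k \le n$ and $\mathcal{F}^N_{k-1}$ measurable function $\varphi^N_n$ satisfying $\varphi^N_n \in \mathrm{Osc}_1(\mathcal{X})$ almost surely,

$$\sqrt{N}\; \mathbb{E}^y_\theta\left( \left| \Phi_{\theta,k,n}(\eta^N_{\theta,k})(\varphi^N_n) - \Phi_{\theta,k-1,n}(\eta^N_{\theta,k-1})(\varphi^N_n) \right|^r \right)^{1/r} \le a_r\, b_{\theta,k,n}\, \beta(P_{\theta,k,n}),$$

where, by convention, $\Phi_{\theta,-1,n}(\eta^N_{\theta,-1}) = \eta_{\theta,n}$, and the constants $b_{\theta,k,n}$ and $\beta(P_{\theta,k,n})$ were defined in (7.2).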


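Proof:

$$\Phi_{k,n}(\eta^N_k)(\varphi^N_n) - \Phi_{k-1,n}(\eta^N_{k-1})(\varphi^N_n) = \int \left( \frac{\eta^N_k(dx_k)\, Q_{k,n}(1)(x_k)}{\eta^N_k Q_{k,n}(1)} - \frac{\Phi_k(\eta^N_{k-1})(dx_k)\, Q_{k,n}(1)(x_k)}{\Phi_k(\eta^N_{k-1}) Q_{k,n}(1)} \right) P_{k,n}(\varphi^N_n)(x_k)$$

where $\Phi_0(\eta^N_{-1}) = \eta_0$ by convention. Applying Lemma 7.1 with the estimates in (7.2) we have

$$\sqrt{N}\; \mathbb{E}\left( \left| \Phi_{k,n}(\eta^N_k)(\varphi^N_n) - \Phi_{k-1,n}(\eta^N_{k-1})(\varphi^N_n) \right|^r \,\middle|\, \mathcal{F}^N_{k-1} \right)^{1/r} \le a_r\, b_{k,n}\, \beta(P_{k,n})$$

almost surely.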



Lemma 7.3 may be used to derive the following error estimate [Del Moral, 2004, Theorem 7.4.4].



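Lemma 7.4 For any $r \ge 1$, there exists a constant $c_r$ such that the following inequality holds for all $\theta$, $y$, $n \ge 0$ and $\varphi \in \mathrm{Osc}_1(\mathcal{X})$,

$$\sqrt{N}\; \mathbb{E}^y_\theta\left( \left| \left[\eta^N_{\theta,n} - \eta_{\theta,n}\right](\varphi) \right|^r \right)^{1/r} \le c_r \sum_{k=0}^n b_{\theta,k,n}\, \beta(P_{\theta,k,n}). \tag{7.4}$$

Assume (A). For any $r \ge 1$, there exists a constant $c'_r$ such that for all $\theta$, $y$, $n \ge 0$, $\varphi \in \mathrm{Osc}_1(\mathcal{X})$, $G \in \mathcal{B}(\mathcal{X})$ such that $G$ is positive and satisfies $G(x) \ge c_G\, G(x')$ for all $x, x' \in \mathcal{X}$ for some positive constant $c_G$,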


V0,„{dXn)G{x„) r]g^„{dXn)G{Xn) 



Ve.n{G) 



(^) 



(7.5) 



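Proof:

The first part follows from applying Lemma 7.3 to the telescopic sum [Del Moral, 2004, Theorem 7.4.4]:

$$\eta^N_n - \eta_n = \sum_{k=0}^n \left[ \Phi_{k,n}(\eta^N_k) - \Phi_{k-1,n}(\eta^N_{k-1}) \right]$$

with the convention that $\Phi_{-1,n}(\eta^N_{-1}) = \eta_n$. For the second part, use the same telescopic sum but with the $k$-th term being

$$\frac{\Phi_{k,n}(\eta^N_k)(G\varphi)}{\Phi_{k,n}(\eta^N_k)(G)} - \frac{\Phi_{k-1,n}(\eta^N_{k-1})(G\varphi)}{\Phi_{k-1,n}(\eta^N_{k-1})(G)} = \int \left( \frac{\eta^N_k(dx_k)\, Q_{k,n}(G)(x_k)}{\eta^N_k Q_{k,n}(G)} - \frac{\Phi_k(\eta^N_{k-1})(dx_k)\, Q_{k,n}(G)(x_k)}{\Phi_k(\eta^N_{k-1}) Q_{k,n}(G)} \right) \frac{Q_{k,n}(G\varphi)(x_k)}{Q_{k,n}(G)(x_k)}.$$

Apply Lemma 7.1 using the same estimates in (7.2), i.e. the same estimates hold with $G$ replacing $1$ in the definition of $b_{k,n}$ and with $G$ replacing $Q_{q,n}(1)$ in the argument of $\beta$.

The following result is a consequence of Lemma 7.4.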



Lemma 7.5 Assume (A). For any $r \ge 1$, there exists a constant $C_r$ such that the following inequality holds for all $\theta$, $y$, $0 \le k \le n$, $N > 0$ and $\varphi_n \in \mathrm{Osc}_1(\mathcal{X})$,



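Proof:

The result is established by expressing $\Phi_{k,n}(\eta^N_k)$ as

$$\Phi_{k,n}(\eta^N_k)(dx_n) = \int \frac{\eta^N_k(dx_k)\, Q_{k,n}(1)(x_k)}{\eta^N_k Q_{k,n}(1)}\, P_{k,n}(x_k, dx_n),$$

expressing $\Phi_{k,n}(\eta_k)$ similarly, setting $G$ in (7.5) to $Q_{k,n}(1)$, $\varphi = P_{k,n}(\varphi_n)$ and using the estimates in (7.2).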



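Lemma 7.6 For each $r \ge 1$, there exists a finite constant $C_r$ such that for all $\theta$, $y$, $0 \le k \le q \le n$, and $\mathcal{F}^N_{k-1}$ measurable functions $\varphi^N_q$ satisfying $\varphi^N_q \in \mathrm{Osc}_1(\mathcal{X})$ almost surely,

$$\sqrt{N}\; \mathbb{E}\left( \left| \int \left( \frac{\Phi_{\theta,k,q}(\eta^N_{\theta,k})(dx_q)\, Q_{\theta,q,n}(1)(x_q)}{\Phi_{\theta,k,q}(\eta^N_{\theta,k})\, Q_{\theta,q,n}(1)} - \frac{\Phi_{\theta,k-1,q}(\eta^N_{\theta,k-1})(dx_q)\, Q_{\theta,q,n}(1)(x_q)}{\Phi_{\theta,k-1,q}(\eta^N_{\theta,k-1})\, Q_{\theta,q,n}(1)} \right) \varphi^N_q(x_q) \right|^r \right)^{1/r} \le C_r\, b_{\theta,k,n}\, \beta\left( \frac{Q_{\theta,k,q}(x_k, dx_q)\, Q_{\theta,q,n}(1)(x_q)}{Q_{\theta,k,q}(Q_{\theta,q,n}(1))(x_k)} \right).$$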
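Proof:

This result is established by noting that

$$\frac{\Phi_{k,q}(\eta^N_k)(dx_q)\, Q_{q,n}(1)(x_q)}{\Phi_{k,q}(\eta^N_k)\, Q_{q,n}(1)} - \frac{\Phi_{k-1,q}(\eta^N_{k-1})(dx_q)\, Q_{q,n}(1)(x_q)}{\Phi_{k-1,q}(\eta^N_{k-1})\, Q_{q,n}(1)} = \int \left( \frac{\eta^N_k(dx_k)\, Q_{k,n}(1)(x_k)}{\eta^N_k Q_{k,n}(1)} - \frac{\Phi_k(\eta^N_{k-1})(dx_k)\, Q_{k,n}(1)(x_k)}{\Phi_k(\eta^N_{k-1})\, Q_{k,n}(1)} \right) \frac{Q_{k,q}(x_k, dx_q)\, Q_{q,n}(1)(x_q)}{Q_{k,n}(1)(x_k)}.$$

Now Lemma 7.1 is applied using the estimates in (7.2).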

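Lemma 7.7 Assume (A). There exists a collection of pairs of finite positive constants, $a_i, c_i$, $i \ge 1$, such that the following bounds hold for all $r \ge 1$, $\theta$, $y$, $0 \le p \le n$, $N \ge 1$, $x_p \in \mathcal{X}$, $F_p \in \mathcal{B}(\mathcal{X}^{p+1})$,

$$\sqrt{N}\; \mathbb{E}^y_\theta\left( \left| \mathcal{M}^N_{\theta,p}\left(F_p(\cdot, x_p)\right)(x_p) - \mathcal{M}_{\theta,p}\left(F_p(\cdot, x_p)\right)(x_p) \right|^r \right)^{1/r} \le \|F_p\|\, a_r\, p,$$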



iVE^ 



{\Df,pAFn){xp) - De.pAFn){xp)[ 



Proof: 



For each Xp, let xo:p-i Gp-i^xp{xvi:p-i) — Fp{xQ.p)l{^p\^p-i) ■ Adopting the convention t^q = r/o, 



M; {Fp{., Xp)) (xp) - Mp Xp)) (xp) 



E 

fe=i 

p 

E 

k=l 



Vp-kDp-k,p-iidxo:p-i)q{xp\xp-i) ri^_^D^_^p_j^{dxo:p-^i)qixp\xp-i) 



ri^-kD^-k^p-M^pD) 



V^-kD^-k.p-Ali^p\-)) 



Fp{xo.,p) 



ri^_j,{dxp-k)Qp~k^p-i[q[xp\.)){xp-k) ri^_k{.dxp-k)Qp^k,p-i(.q{.Xp\.)){xp-k 



Vp-kQp-k,P-i{qixp\-)) 

^p-k,p-l,Xp i^p-k) 



Vp-kQp~k^p~iiqi^p\-)) 



where G^_fc,p_i,^ 
norm 



'p-k,p^iiq{xp\.)){xp^k) 

^(xp-k) = D^_^. p_^{Gp-i^xp){xp-k), which is a J^J^j.^^^ -measurable function with 

^p-k,p-l,Xp i^p-k) 



sup 



Qp-k,p-iiq{xp\.)){xp-k) 



< \\F„ 



The result is established upon applying Lemma 7.1 (see Remark 7.2) to each term in the sum separately and using the estimates in (7.2). To establish the second result, let

$$F_{p,n}(x_{0:p}) = \int Q_{p+1}(x_p, dx_{p+1}) \cdots Q_n(x_{n-1}, dx_n)\, F_n(x_{0:n}).$$

Then,

$$D^N_{p,n}(F_n)(x_p) - D_{p,n}(F_n)(x_p) = \mathcal{M}^N_p\left(F_{p,n}(\cdot, x_p)\right)(x_p) - \mathcal{M}_p\left(F_{p,n}(\cdot, x_p)\right)(x_p).$$

The result follows by setting c„ = psupg ||Qe.p.ri(l)|| and it follows from Assumption (A) that c„ is 
finite. 

Lemma 7.8 and Lemma 7.9 both build on the previous results and are needed for the proof of Theorem 3.1.

Lemma 7.8 Assume (A). For any $r \ge 1$ there exists a constant $C_r$ such that for all $\theta$, $y$, $0 \le k \le n$, $N \ge 1$, $\varphi_n \in \mathrm{Osc}_1(\mathcal{X})$,



< 2(n - fc)ap""'' 

Proof: 

The term (7.6) can be further expanded as



tg^k (Xk-l,Xk) (Pn[Xn) nN TTT 



(7.6) 



Vk^k (1) ^''M ^NnN n\ 



{dX0:n)tk (Xfe_i,Xfc) ((y5„(x„) - T]^ (ifin)) 



^^P'^-^I) tk[Xk^,,Xk)yPn[x^) ^MDN^^^^ 



p=k 

n-1 

-E 

p—k 

n-1 

-j: 

p—k 

n-1 

-E 

p—k 

n-1 

-E 

p—k 

n—1 . 

p—k 



Vp-j-l^p^l^ni^^O-.n) 



tk {xk-l,Xk) ipn{Xn) 



<+li^p^+l,„(l) 



„N^)N^^\ n'^.nN^ (^\ ] tk[Xk-i,Xk) ^fn[Xn) r]^D^„{l) 



, <^p^„(l) <+l^p^+l,„(l) j 1, 

( Vp Dp,nidxO:n) _ Vp+lDp+l,nidX0:n) \ 



X 1^4. (..-...) -^^^^^ 



(7.7) 



(7.8) 



For the first equality, note that rj^ ^^{dxo.n) — Qn (dxo-.n)- It is straightforward to estabhsh that 

ri^D^JdX0:n)/Vp igiVpl •)) = V^+,D^+iJdxO:n), (7.9) 
which is due to



Vp{dxp)g{yp\xp)f{xp+i\xp)dxp+ir]^{g{yp\-)f{xp+i\-)) " ^ 



n-l 

Mp+i{xp+i,dxp)?}^+j^{dxp+i) Y[ Qj+ii 

Xj, dxj-f-i] 



n Qj+iixj,dxj+i) 
j=P+i 



j=p+i 



Thus 



■Hp Dp.n{dxo:p+i,dxn) _ ?7pVi -Dp+i,„ (rfa^Oip+i , 

_ Tlp+iDp+i^Mxo..p+i,dxn) „(da;o:p+i,da;„) 



{dxp+i (1 ) (a:^p+i ) 



(7.10) 



Vp+lQp+l,n{^) 

^^i(rfa;p+i)Qp+i,n(l)(a;p+i) ' 



Qp+l,n('^p+l 5 dXn) 
_,„(l)(Xp+i) 



In the first line, variables $x_{p+2:n-1}$ of the measures $\eta^N_p D^N_{p,n}(dx_{0:n})$ and $\eta^N_{p+1} D^N_{p+1,n}(dx_{0:n})$ are integrated out while the second line follows from (7.9). Using (7.10), the term (7.7) can be expressed as



n-l 

E 

p—k 



Vp+i{dxp+i)Qp+i,„il)ixp+i) ??p+i('ixp+i)Qp+i,„(l)(a:p+i) 



'7pVi'9p+i,«(l) 



Note that by 



(Eg) and (ESI), 



(xp+i) 



p+l,n I 'Pn 



(2^p+l) 



</3 



J Qp-\-l.n (*^p+l 5 (^^n ) 

V Qp+i,n(i)(a;p+i) 



(a^P+i) 



< 



C/3 {Mp''^,...M^_,,) 



Thus by (7.2) and Lemma 7.6 we conclude that there exists a finite constant $C_r$ (depending only on $r$) such that



p—k 



tk {xk-l,Xk) 



^n{Xn) 



Vp Dp,nidX0:n) _ Vp+lDp+l,nidxO:n) 



<{n- k)CrT (7.11) 
For the term (fTSj) . it follows from ([7111))

Vp+l {dXp+l )Qp+l,n ( 1) ) 



V?+l (Qp+l.n(l).M^+ifa)) ' 
Vp+lQp+l.n{'^) 



Thus, using (3.3) and (7.3), there exists some non-random constant $C$ such that the following bound holds almost surely for all integers $k \le p < n$, $N$:



Combine this bound with Lemma 7.3 to conclude that there exists a finite (non-random) constant $C_r$ (depending only on $r$) such that for all integers $k \le p < n$, $N$:



NEi 



(7.12) 



The result now follows from (7.11) and (7.12).



Lemma 7.9 Assume (A). For any $r \ge 1$ there exists a constant $C_r$ such that for all $\theta$, $y$, $0 \le k \le n$, $N \ge 1$, $\varphi_n \in \mathrm{Osc}_1(\mathcal{X})$,



Ve,kDe.k,nidxo:n) 



tg^k ixk-l,Xk) ipniXn) 



h,nidX0:n)te,k {xk-l,Xk) (ifniXn) " '7e,n(</'n)) 



i — k 



(7.13) 



Proof: 



i{dxo.,n)tk {xk^i,Xk) (ipniXn) ~ Vni^fin)) 
(v^Dl^{dX0:n) 



To study the errors, term (7.14) may be decomposed as
'r^^D^^^Xdxo-.n 



- Q,i(d.TO:„) tk {xk-l,Xk) {(finiXn) - ??n('^n)) (7.14) 



i{dxQ.,n) tk{xk-l,Xk){ip„{Xn) ~ rini^n)) 



p=0 
with the convention that $\widetilde{\eta}^N_0 = \Phi_0(\eta^N_{-1}) = \eta_0$. The term corresponding to $p = k$ can be expressed as



7T\ ^F7i ^'^k [Xk,dXk-l)tk {Xk-l,Xk) PkA'Pn-Vn{V>n)){Xk) 

Using Lemma 7.1 and Remark 7.2,



™ , , ^'^ i<^^^JQ>^Ai)ix^ _ Vk idxk)QkAi)ixk) \ ^(^^ _ 

Similarly, the pth term when p < k can be expressed as 

^ <I?,^„(1) v^D^Jl) ) ^^'-'^^'^ - 

$p,fc-i(f7^)(rfa;fc-i)Qfc-i,n(l)(a;fc-i) $p,fc-i(?7^)((ia;fc_i)(3fe_i,„(l)(a;fc_i)\ 



$p,fe-l«)Qfc-l,n(l) 

(3fc(a;/c-i,rfa;fc)(3fc,n(i)(a;fe) 



<i>p,fe_i(^7^)Qfc_i,„(l) 

tfc (Xfc_i,Xfc)PA:,„ {(fin - rjni'fn)) (xk) 



Qfc-i,n(l)(a:fc-i) 

Using Lemma [7^ for the outer integral (recall <^p^k-i{rjp) ~ ^p-i,k-i{ilp-i))j 



^ — n — k — k—l — p 

Combining both cases for p yields 

'v^Dl^Xdx,:n) 



% (<-l)^p^n(l) 



tk {xk^l,Xk) {^Pn{Xn) - Vni^n)) 



NE 



k-1 



- Q„{dxO;n) tk {xk-l,Xk) {(Pn{x„) - Vni^n)) 



-n^k 



(7.16) 



p=0 



1 



1-P 



For (7.15), Lemma 7.5 yields the following estimate



V^kDUVn)\ V^D^tk) 



< Crlf 



(7.17) 



The proof is completed by summing the bounds in (7.16), (7.17) and inflating constant $C_r$ appropriately.
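7.1 Proof of Theorem 3.1

$$\zeta^N_n(\varphi_n) - \zeta_n(\varphi_n) = \sum_{k=0}^n \left[ \int \mathbb{Q}^N_n(dx_{0:n})\, t_k(x_{k-1}, x_k) \left( \varphi_n(x_n) - \eta^N_n(\varphi_n) \right) - \int \mathbb{Q}_n(dx_{0:n})\, t_k(x_{k-1}, x_k) \left( \varphi_n(x_n) - \eta_n(\varphi_n) \right) \right].$$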

To prove the theorem, it will be shown that the error due to the $k$-th term in this expression is



NE 



(dxO:n)tk ixk-l,Xk) {(finiXn) " Vn i'fin)) 



}n{dX0:n)tk ixk-l,Xk) {(fniXn) - Vni.^n)) 



< {n-k+l)CrP 



—n — k 



where constant Cr depends only on r and the bounds in Assumption (A) (through the estimates p 
and in (|7.2p as well as the bounds on the score). 

{dX0:n)tk {xk-l,Xk) {ipn{Xn) - Vn ifn)) - J Qn{dxo.,n)tk {xk-l,Xk) {fn{Xn) " Vnifn)) 
'•.n {dxO:n)tk {xk-l,Xk) {(Pn{Xn) " Vn ifn)) 



tk {Xk-l,Xk) I (pn[Xn) - „jv r,N 



vi^DZnil) 



tk {xk-l,Xk) iPn{Xn) 



(7.18) 



(7.19) 



}n{dX0;n)tk ixk-l,Xk) {(Pn{Xn) - Vni^n)) 



The proof is completed by summing the bounds in Lemma 7.8 for (7.18) and Lemma 7.9 for (7.19) and inflating constant $C_r$ appropriately.

7.2 Proof of Theorem EH 

The following result, which characterizes the asymptotic behavior of the local sampling errors, is proved in Del Moral [2004, Theorem 9.3.1].

Lemma 7.10 Let $\{\varphi_n\}_{n \ge 0} \subset \mathcal{B}(\mathcal{X})$. For any $\theta$, $y$, $n \ge 0$, the random vector $(V^N_{\theta,0}(\varphi_0), \ldots, V^N_{\theta,n}(\varphi_n))$ converges in law, as $N \to \infty$, to $(V_{\theta,0}(\varphi_0), \ldots, V_{\theta,n}(\varphi_n))$ where $V_{\theta,i}$ is defined in (3.4).



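The following multivariate fluctuation theorem, first proved under slightly different assumptions in Del Moral et al. [2010], is needed. See also Douc et al. [2009] for a related study.

Theorem 7.11 Assume (A). For any $\theta$, $y$, $n \ge 0$, $F_n \in \mathcal{B}(\mathcal{X}^{n+1})$, $\sqrt{N}\left( \mathbb{Q}^N_{\theta,n} - \mathbb{Q}_{\theta,n} \right)(F_n)$ converges in law, as $N \to \infty$, to the centered Gaussian random variable

$$\sum_{p=0}^n V_{\theta,p}\left( G_{\theta,p,n}\, \frac{D_{\theta,p,n}\left(F_n - \mathbb{Q}_{\theta,n}(F_n)\right)}{D_{\theta,p,n}(1)} \right)$$

where $V_{\theta,p}$ is defined in (3.4).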
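Proof:

Let

$$\gamma_n = \prod_{k=0}^{n-1} \eta_k\left(g(y_k|\cdot)\right)$$

and define the unnormalized measure

$$\Gamma_n = \gamma_n\, \mathbb{Q}_n.$$

The corresponding particle approximation is $\Gamma^N_n = \gamma^N_n\, \mathbb{Q}^N_n$ where $\gamma^N_n = \prod_{k=0}^{n-1} \eta^N_k\left(g(y_k|\cdot)\right)$. The result is proven by studying the limit of $\sqrt{N}\left(\Gamma^N_n - \Gamma_n\right)$ since

$$\left[\mathbb{Q}^N_n - \mathbb{Q}_n\right](F_n) = \frac{1}{\gamma^N_n} \left[\Gamma^N_n - \Gamma_n\right]\left(F_n - \mathbb{Q}_n(F_n)\right).$$

Note that Lemma 7.4 implies $\gamma^N_n$ converges almost surely to $\gamma_n$. The key to studying the limit of $\sqrt{N}\left(\Gamma^N_n - \Gamma_n\right)$ is the decomposition

$$\sqrt{N} \left[\Gamma^N_n - \Gamma_n\right](F_n) = \sum_{p=0}^n \gamma^N_p\, V^N_p\left(D_{p,n}(F_n)\right) + R^N_n(F_n)$$

where the remainder term is

$$R^N_n(F_n) := \sum_{p=0}^n \gamma^N_p\, V^N_p\left(F^N_{p,n}\right) \quad \text{and the function} \quad F^N_{p,n} := \left[D^N_{p,n} - D_{p,n}\right](F_n).$$

By Slutsky's lemma and by the continuous mapping theorem (see van der Vaart [1998]) it suffices to show that $R^N_n(F_n)$ converges to 0, in probability, as $N \to \infty$. To prove this, it will be established that $\mathbb{E}\left(R^N_n(F_n)^2\right)$ is $O(N^{-1})$. Since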



n 



p=0 

_ cliiiiuaL ouiCi_y, wiicic 



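and $|\gamma^N_p| \le c_p$ almost surely, where $c_p$ is some non-random constant which can be derived using (A), it suffices to prove that $\mathbb{E}\left(V^N_p(F^N_{p,n})^2\right)$ is $O(N^{-1})$. By expanding the square one arrives at

$$\mathbb{E}\left( V^N_p(F^N_{p,n})^2 \,\middle|\, \mathcal{F}^N_{p-1} \right) \le \Phi_p(\eta^N_{p-1})\left( (F^N_{p,n})^2 \right).$$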



By Assumption (A), for any Xp_i G A", 

% {v^-i) {{FZ.f) <P'J dxp f{xp\xp.,) Fjyjxp)^ 

By Lemma 7.7, $\mathbb{E}\left(V^N_p(F^N_{p,n})^2\right)$ is $O(N^{-1})$.

The next lemma is needed to quantify the variance of the particle estimate of the filter gradient computed using the path-based method. Note that this lemma does not require the hidden chain to be mixing. We refer the reader to Del Moral and Miclo [2001] for a propagation of chaos analysis.

For any $\theta$, $y = \{y_n\}_{n \ge 0}$, let $\{V_{\theta,n}\}_{n \ge 0}$ be a sequence of independent centered Gaussian random fields defined as follows. For any sequence of functions $\{F_n \in \mathcal{B}(\mathcal{X}^{n+1})\}_{n \ge 0}$ and any $p \ge 0$, $\{V_{\theta,n}(F_n)\}_{n=0}^{p}$ is a collection of independent zero-mean Gaussian random variables with variances given by

■.n)\VO:n-l) ■ (7.20) 
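Lemma 7.12 Let $\{\delta_\theta\}_{\theta \in \Theta} \subset [1, \infty)$ and assume $\delta_\theta^{-1} \le g_\theta(y|x) \le \delta_\theta$ for all $(x, y, \theta) \in \mathcal{X} \times \mathcal{Y} \times \Theta$. For any $\theta$, $y$, $n \ge 0$, $F_n \in \mathcal{B}(\mathcal{X}^{n+1})$, $\sqrt{N}\left( p^N_\theta(dx_{0:n}|y_{0:n-1}) - \mathbb{Q}_{\theta,n} \right)(F_n)$ converges in law, as $N \to \infty$, to the centered Gaussian random variable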

n 
p=0 

where Gg^p^n was defined in |i. ?[ ) and 

Fe,p,n ~ ^e{F{Xo:n)\xO:p,yp+l:n-l) —Qe,n{Fn) 

7.2.1 Proof of Theorem 3.2

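It follows from Algorithm 1 that

$$\zeta^N_n(\varphi_n) - \zeta_n(\varphi_n) = \mathbb{Q}^N_n(\varphi_n T_n) - \mathbb{Q}_n(\varphi_n T_n) + \mathbb{Q}_n(\varphi_n)\, \mathbb{Q}_n(T_n) - \mathbb{Q}^N_n(\varphi_n)\, \mathbb{Q}^N_n(T_n). \tag{7.21}$$

The second term on the right hand side of the equality can be expressed as

$$\mathbb{Q}_n(\varphi_n)\, \mathbb{Q}_n(T_n) - \mathbb{Q}^N_n(\varphi_n)\, \mathbb{Q}^N_n(T_n) = \mathbb{Q}_n(T_n) \left( \mathbb{Q}_n(\varphi_n) - \mathbb{Q}^N_n(\varphi_n) \right) + \mathbb{Q}_n(\varphi_n) \left( \mathbb{Q}_n(T_n) - \mathbb{Q}^N_n(T_n) \right) + \left( \mathbb{Q}^N_n(\varphi_n) - \mathbb{Q}_n(\varphi_n) \right) \left( \mathbb{Q}_n(T_n) - \mathbb{Q}^N_n(T_n) \right). \tag{7.22}$$

Combining the two expressions in (7.21) and (7.22) gives

$$\zeta^N_n(\varphi_n) - \zeta_n(\varphi_n) = \mathbb{Q}^N_n\left( (\varphi_n - \mathbb{Q}_n(\varphi_n)) (T_n - \mathbb{Q}_n(T_n)) \right) - \mathbb{Q}_n\left( (\varphi_n - \mathbb{Q}_n(\varphi_n)) (T_n - \mathbb{Q}_n(T_n)) \right) + \left( \mathbb{Q}^N_n(\varphi_n) - \mathbb{Q}_n(\varphi_n) \right) \left( \mathbb{Q}_n(T_n) - \mathbb{Q}^N_n(T_n) \right).$$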
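The decomposition obtained by combining (7.21) and (7.22) is an algebraic identity that holds for any pair of probability measures, so it can be sanity-checked numerically. The sketch below assumes a three-point path space and hypothetical values for the two measures (`Q_true`, `Q_part`) and for the functions `phi` and `T`; it also assumes, as (7.21) suggests, that the filter derivative takes the covariance form $\mathbb{Q}_n(\varphi_n T_n) - \mathbb{Q}_n(\varphi_n)\mathbb{Q}_n(T_n)$.

```python
# Check of the centred decomposition of the filter-derivative error (toy values).

def integrate(measure, func):
    """Integral of func against a measure on a finite path space."""
    return sum(m * v for m, v in zip(measure, func))

def product(u, v):
    """Pointwise product of two functions on the finite path space."""
    return [a * b for a, b in zip(u, v)]

Q_true = [0.25, 0.35, 0.40]      # stands in for the exact path measure
Q_part = [0.20, 0.50, 0.30]      # stands in for its particle approximation
phi = [1.0, -2.0, 0.5]           # test function
T = [0.3, 1.7, -0.9]             # additive score functional

def zeta(measure):
    """Covariance-type functional Q(phi T) - Q(phi) Q(T)."""
    return (integrate(measure, product(phi, T))
            - integrate(measure, phi) * integrate(measure, T))

error = zeta(Q_part) - zeta(Q_true)

# Functions centred under the exact measure.
phi_c = [v - integrate(Q_true, phi) for v in phi]
T_c = [v - integrate(Q_true, T) for v in T]

decomposition = (
    integrate(Q_part, product(phi_c, T_c))
    - integrate(Q_true, product(phi_c, T_c))
    + (integrate(Q_part, phi) - integrate(Q_true, phi))
    * (integrate(Q_true, T) - integrate(Q_part, T))
)
```

The two quantities `error` and `decomposition` agree up to rounding. The first two terms of the decomposition form a linear fluctuation of the particle measure, while the last term is a product of two errors and is therefore of smaller order.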

Using Lemma 7.4 with $r = 2$ and Chebyshev's inequality, we see that $\left( \mathbb{Q}^N_n(\varphi_n) - \mathbb{Q}_n(\varphi_n) \right)$ converges in probability to 0. Theorem 7.11 can now be invoked with Slutsky's theorem to arrive at the stated
result in p.Sp . 

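Moving on to the uniform bound on the variance, let

$$T_n - \mathbb{Q}_n(T_n) = \sum_{k=0}^n \bar{t}_k, \qquad \bar{t}_k = t_k - \mathbb{Q}_n(t_k), \qquad \bar{\varphi}_n = \varphi_n - \mathbb{Q}_n(\varphi_n).$$

Also, the argument of $V_p$ can be expressed as

$$\phi_p(x_p) = \frac{Q_{p,n}(1)(x_p)}{\eta_p Q_{p,n}(1)} \sum_{k=0}^n \frac{D_{p,n}\left( \bar{\varphi}_n \bar{t}_k - \mathbb{Q}_n(\bar{\varphi}_n \bar{t}_k) \right)(x_p)}{D_{p,n}(1)(x_p)}.$$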
It is straightforward to see that rip{(t)p) = 0. Therefore the variance (see l\'SA[ ) now simplifies to 

Dp.njFn - QnjFn)) 
p—0 ^ y ^ / p—Q 



Vp [ Gp,n — — — ^Y) ) = Vp{(f>p)- (7.23) 
Consider the function $\phi_p$. For $p \le k - 1$,



Dp^„{l){xp) 



^pQp,n (1) 
Qp,k-liXp,dXk-l)Qk-l.n{i){xk-l) 
Qp^n (1) (-^p) 

Qp,/c_i(a;^,da-A;_i)Qfe_i.„(l)(a;A;_i) 



Qfc_i,„(l)(xfc_i) 

Using the estimates in (3.3) and (7.2), this function is bounded by



Qp,ni^){Xp) 
tk{xk-l,Xk)Pk,ni'Pn){xk). 



sup 



Dp^n{l){Xp) 



(7.24) 



for some constant $C$. When $p \ge k$,

Dp,n {^ntk - Qn {Vntk)) {Xp) 
Dp^n{l){Xp) 

'^'^'^tgpTw^'''^ (A^p(4)(x,)P,,„(^„)(a;,) - Mp(Jk){x'p)PpA^n){x'p)) 
Again using the estimates in (3.3), (7.2) and (7.3),

-Dp,« - Qn ('^n^fc)) {Xp) 



sup 



-Dp,„(l)(xp) 



'—n—k 



(7.25) 



Combining (7.24) and (7.25),



sup |(/<p(xp)| < + Cp"-P-i(7i -p), 



$0 \le p \le n$. Combining this bound with (7.23) will establish the result.